Random number generators are widely used in practical algorithms. Examples include
simulation, number theory (primality testing and integer factorization), fault tolerance, routing,
cryptography, optimization by simulated annealing, and perfect hashing.

Complexity theory usually considers the worst-case behaviour of deterministic algorithms,
but it can also consider average-case behaviour if it is assumed that the input data is drawn
randomly from a given distribution. Rabin popularised the idea of "probabilistic" algorithms,
where randomness is incorporated into the algorithm instead of being assumed in the input
data. Yao showed that there is a close connection between the complexity of probabilistic
algorithms and the average-case complexity of deterministic algorithms.

We give examples of the uses of randomness in computation, discuss the contributions of
Rabin, Yao and others, and mention some open questions.

1 Checking out Galileo 

The Galileo spacecraft is somewhere near Jupiter, but its main radio antenna is not working, 
so communication with it is very slow. Suppose we want to check that a critical program in 
Galileo's memory is correct, and has not been corrupted by a passing cosmic ray. How can we 
do this without transmitting the whole program to or from Galileo?

Here is one way. The program we want to check (say $N_1$) and the correct program on Earth
(say $N_2$) can be regarded as multiple-precision integers. Choose a random prime number $p$ in
the interval $(10^9, 2 \times 10^9)$. Transmit $p$ to Galileo and ask it to compute

$r_1 \leftarrow N_1 \bmod p$

and send it back to Earth. Only a few bits (no more than 64 for $p$ and $r_1$) need be transmitted
between Earth and Galileo, so we can afford to use good error correction/detection.

On Earth we compute $r_2 \leftarrow N_2 \bmod p$, and check if $r_1 = r_2$. There are two possibilities:

• $r_1 \ne r_2$. We conclude that $N_1 \ne N_2$. Galileo's program has been corrupted! If there are
only a small number of errors, they can be localised by binary search using $O(\log\log N_1)$
small messages.

• $r_1 = r_2$. We conclude that Galileo's program is probably correct. More precisely, if Galileo's
program is not correct there is only a probability of less than $10^{-9}$ that $r_1 = r_2$, i.e. that
we have a "false positive". If this probability is too large for the quality-assurance team to
accept, just repeat the process (say) ten times with different random primes $p_1, p_2, \ldots, p_{10}$.
If $N_1 \ne N_2$, there is a probability of less than

$10^{-90}$

that we get $r_1 = r_2$ ten times in a row. This should be good enough.
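The check can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two program images are available as byte strings, primes are found by trial division (fast enough below $2 \times 10^9$), and the helper names `is_prime`, `random_prime`, `fingerprint` and `probably_equal` are invented here and do not come from the text.

```python
import random


def is_prime(n):
    """Trial division; adequate for the 31-bit primes used here."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    f = 3
    while f * f <= n:
        if n % f == 0:
            return False
        f += 2
    return True


def random_prime(lo, hi, rng):
    """Return a random prime p with lo < p < hi (rejection sampling)."""
    while True:
        candidate = rng.randrange(lo + 1, hi)
        if is_prime(candidate):
            return candidate


def fingerprint(program, p):
    """Treat the program image as one big integer N and return N mod p."""
    return int.from_bytes(program, "big") % p


def probably_equal(earth_copy, galileo_copy, trials=10, rng=None):
    """False: the copies certainly differ. True: they probably agree."""
    rng = rng or random.Random()
    for _ in range(trials):
        p = random_prime(10**9, 2 * 10**9, rng)
        r1 = fingerprint(galileo_copy, p)  # computed on board; only r1 is sent back
        r2 = fingerprint(earth_copy, p)    # computed on Earth
        if r1 != r2:
            return False
    return True
```

A single flipped bit changes the integer by a power of two, which no odd prime divides, so in that case the first trial already reports a mismatch.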

The problem and its solution were communicated to me by Michael Rabin, who called it the 
"Library of Congress on Mars" problem. 

The Structure 

Our procedure has the following form. We ask a question with a yes/no answer. The precise 
question depends on a random number. If the answer is "no", we can assume that it is correct. 
If the answer is "yes", there is a small probability of error, but we can reduce this probability to
a negligible level by repeating the procedure a few times with independent random numbers. 

We call such a procedure a probabilistic algorithm; other common names are randomised 
algorithm and Monte Carlo algorithm. 

Disclaimer 

It would be much better to build error correcting hardware into Galileo, and not depend on 
checking from Earth. 

2 Testing Primality 

Here is another example¹ with the same structure. We want an algorithm to determine if a
given odd positive integer $n$ is prime. Write $n$ as $2^k q + 1$, where $q$ is odd and $k > 0$.

Algorithm P

1. Choose a random integer $x$ in $(1, n)$.

2. Compute $y = x^q \bmod n$. This can be done with $O(\log q)$ operations mod $n$, using the
binary representation of $q$.

3. If $y = 1$ then return "yes".

4. For $j = 1, 2, \ldots, k$ do

      if $y = n - 1$ then return "yes"
      else if $y = 1$ then return "no"
      else $y \leftarrow y^2 \bmod n$.

5. Return "no".

¹ Due to M. O. Rabin [49], with improvements by G. L. Miller. See Knuth [28].

Fermat's Little Theorem

To understand the mathematical basis for Algorithm P, recall Fermat's little theorem:
if $n$ is prime and $0 < x < n$, then

$x^{n-1} = 1 \bmod n$.

Thus, if $x^{n-1} \ne 1 \bmod n$, we can definitely say that $n$ is composite.

Unfortunately, the converse of Fermat's little theorem is false: if $x^{n-1} = 1 \bmod n$ we cannot
be sure that $n$ is prime. There are examples (called Carmichael numbers) of composite $n$ for
which $x^{n-1}$ is always $1 \bmod n$ when $\mathrm{GCD}(x, n) = 1$. The smallest example is

$561 = 3 \cdot 11 \cdot 17$.

Another example is²

$n = 1729 = 7 \cdot 13 \cdot 19$.
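The Carmichael property is easy to confirm by brute force for numbers this small. The following sketch (the function name is ours) checks the Fermat congruence for every base coprime to $n$:

```python
from math import gcd


def fermat_holds_for_all_coprime_bases(n):
    """True if x^(n-1) = 1 mod n for every x in [1, n) with gcd(x, n) = 1."""
    return all(pow(x, n - 1, n) == 1
               for x in range(1, n) if gcd(x, n) == 1)


if __name__ == "__main__":
    for n in (561, 1729, 15, 13):
        print(n, fermat_holds_for_all_coprime_bases(n))
```

Both 561 and 1729 pass for every coprime base although they are composite, whereas an ordinary composite such as 15 fails already for $x = 2$.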

An Extension 

A slight extension of Fermat's little theorem is useful, because its converse is usually true.
If $n = 2^k q + 1$ is an odd prime, then either $x^q = 1 \bmod n$, or the sequence

$\left(x^{2^j q} \bmod n\right)_{j = 0, 1, \ldots, k}$

ends with 1, and the value just preceding the first appearance of 1 must be $n - 1$.

Proof: If $y^2 = 1 \bmod n$ then $n \mid (y - 1)(y + 1)$. Since $n$ is prime, $n \mid (y - 1)$ or $n \mid (y + 1)$. Thus
$y = \pm 1 \bmod n$. □

The extension gives a necessary (but not sufficient) condition for primality of n. Algorithm P 
just checks if this condition is satisfied for a random choice of x, and returns "yes" if it is. 

Reliability of Algorithm P 

Algorithm P cannot give false negatives (unless we make an arithmetic mistake), but it can
give false positives (i.e. "yes" when $n$ is composite). However, the probability of a false positive
is less than 1/4. (Usually much less - see Knuth [28], ex. 4.5.4.22.) Thus, if we repeat the
algorithm 10 times there is less than 1 in $10^6$ chance of a false positive, and if we repeat 100
times the results should satisfy anyone but a pure mathematician.

Algorithm P works fine even if the input is a Carmichael number. 
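A direct transcription of Algorithm P into Python might look as follows. It is a sketch, not a production primality test: the names `algorithm_p_round` and `probably_prime`, the default of 10 rounds and the handling of small or even inputs are our own choices.

```python
import random


def algorithm_p_round(n, rng=random):
    """One run of Algorithm P for odd n > 3.

    True means "yes" (n is probably prime); False means "no" (n is composite).
    """
    k, q = 0, n - 1
    while q % 2 == 0:          # write n - 1 = 2^k * q with q odd
        q //= 2
        k += 1
    x = rng.randrange(2, n)    # step 1: random x in (1, n)
    y = pow(x, q, n)           # step 2: y = x^q mod n
    if y == 1:                 # step 3
        return True
    for _ in range(k):         # step 4
        if y == n - 1:
            return True
        if y == 1:
            return False
        y = y * y % n
    return False               # step 5


def probably_prime(n, rounds=10, rng=random):
    """Repeat Algorithm P; a false "yes" has probability below 4**(-rounds)."""
    if n < 2:
        return False
    if n in (2, 3):
        return True
    if n % 2 == 0:
        return False
    return all(algorithm_p_round(n, rng) for _ in range(rounds))
```

For a prime input every round answers "yes". For the Carmichael numbers 561 and 1729 only a small fraction of the bases $x$ give a false "yes", so a handful of rounds exposes them as composite.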

Use of Randomness 

Note that in both our examples randomness was introduced into the algorithm. 
We did not make any assumption about the distribution of inputs. 

Summary of Algorithm P 

Given any $\varepsilon > 0$, we can check primality of a number $n$ in

$O((\log n)^3 \log(1/\varepsilon))$

bit-operations³, provided we are willing to accept a probability of error of at most $\varepsilon$.
By way of comparison, the best known deterministic algorithm takes

$O((\log n)^{c \log\log\log n})$

bit-operations, and is much more complicated. If we assume the Generalised Riemann Hypothesis,
the exponent can be reduced to 5. (But who believes in GRH with as much certainty as
Algorithm P gives us?)

² Hardy's taxi number [23], $1729 = 12^3 + 1^3 = 10^3 + 9^3$.

³ We can factor $n$ deterministically in $O(\log n)$ arithmetic operations [56], but this result is useless because the
operations are on numbers as large as $2^n$. Thus, it is more realistic to consider bit-operations.

3 Error-Free Algorithms

The probabilistic algorithms considered so far (Monte Carlo algorithms) can give the wrong
answer with a small probability. There is another class of probabilistic algorithms (Las Vegas
algorithms) for which the answer is always correct; only the runtime is random⁴. An interesting
example is H. W. Lenstra's elliptic curve method (ECM) [36] for integer factorisation. To avoid
trivial cases, suppose we want to find a prime factor $p > 3$ of an odd composite integer $N$.

To motivate ECM, consider an earlier algorithm, Pollard's "$p - 1$" method. This works if
$p - 1$ is "smooth", i.e. has only small prime factors. $p - 1$ is important because it is the order
of the multiplicative group $G$ of the field $F_p$. The problem is that $G$ is fixed.
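For readers who have not met it, the first stage of Pollard's "$p - 1$" method can be sketched as below. The function name, the base 2 and the way the smoothness bound is applied (raising to the powers $2, 3, \ldots, B$ in turn, so that the exponent is $B!$) are our simplifications; practical implementations use prime powers and a second stage.

```python
from math import gcd


def pollard_p_minus_1(N, bound, base=2):
    """Try to find a factor p of N for which p - 1 divides bound!.

    Returns a non-trivial factor, or None if this bound does not succeed.
    """
    a = base
    for j in range(2, bound + 1):
        a = pow(a, j, N)            # a = base^(j!) mod N
        g = gcd(a - 1, N)
        if 1 < g < N:
            return g                # a = 1 mod p for some, but not all, primes p | N
    return None


if __name__ == "__main__":
    print(pollard_p_minus_1(299, 5))    # 299 = 13 * 23 and 13 - 1 = 12 is smooth
    print(pollard_p_minus_1(1739, 5))   # 1739 = 37 * 47: bound 5 is too small
    print(pollard_p_minus_1(1739, 6))
```

The second example shows the weakness mentioned in the text: whether the method succeeds depends entirely on the fixed numbers $p - 1$, and nothing can be done if none of them is smooth.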

Lenstra's Idea

Lenstra had the idea of using a group $G(a, b)$ which depends on parameters $(a, b)$. By
randomly selecting $a$ and $b$, we get a large set of different groups, and some of these should have
smooth order.

The group $G(a, b)$ is the group of points on the elliptic curve

$y^2 = x^3 + ax + b \bmod p$,

and by a famous theorem⁵ the order of $G(a, b)$ is an integer in the interval

$(p + 1 - 2\sqrt{p},\ p + 1 + 2\sqrt{p})$.

The distribution in this interval is not uniform, but it is "close enough" to uniform for our
purposes.

Runtime of ECM

Under plausible assumptions ECM has expected run time

$T = O\left(\exp\left(\sqrt{c \log p \log\log p}\right) (\log N)^2\right)$,

where $c \approx 2$.

Note that $T$ depends mainly on the size of $p$, the factor found, and not very strongly on $N$.
In practice the run time is close to an exponentially distributed random variable with mean and
variance about $T$.

ECM Example

ECM is the best known algorithm for finding moderately large factors of very large numbers.

Consider the 617-decimal digit Fermat number $F_{11} = 2^{2^{11}} + 1$. Its factorisation is:

$F_{11} = 319489 \cdot 974849 \cdot 167988556341760475137 \cdot 3560841906445833920513 \cdot p_{564}$,

where $p_{564}$ is a 564-decimal digit prime.

In 1989 I found the 21-digit and 22-digit prime factors using ECM. The factorisation required
about 360 million multiplications mod $N$, which took less than 2 hours on a Fujitsu VP 100
vector processor.

⁴ In practical cases the expected runtime is finite. It is possible that the algorithm does not terminate, but
with probability zero.

⁵ The "Riemann hypothesis for finite fields". $G(a, b)$ is known as the "Mordell-Weil" group. The result on its
order follows from a theorem of Hasse (1934), later generalised by A. Weil and Deligne (see [34]).
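As a small check on the statement about the order of $G(a, b)$, the points on a curve over a small field can simply be counted. The sketch below is brute force and only sensible for tiny $p$; the function names are ours, and singular curves (those with $4a^3 + 27b^2 = 0 \bmod p$) are skipped because they do not define a group of this kind. Counting the point at infinity, Hasse's theorem places the order within $2\sqrt{p}$ of $p + 1$.

```python
def curve_order(a, b, p):
    """Number of points on y^2 = x^3 + a*x + b over F_p, plus the point at infinity."""
    roots = [0] * p                      # roots[s] = number of y with y^2 = s mod p
    for y in range(p):
        roots[y * y % p] += 1
    order = 1                            # the point at infinity
    for x in range(p):
        order += roots[(x * x * x + a * x + b) % p]
    return order


def orders_of_all_curves(p):
    """Map each non-singular parameter pair (a, b) to the order of G(a, b)."""
    orders = {}
    for a in range(p):
        for b in range(p):
            if (4 * a**3 + 27 * b**2) % p != 0:
                orders[(a, b)] = curve_order(a, b, p)
    return orders


if __name__ == "__main__":
    p = 101
    orders = orders_of_all_curves(p)
    print(min(orders.values()), max(orders.values()))
    print(len(set(orders.values())), "distinct group orders")
```

Varying $(a, b)$ produces many different group orders spread around $p + 1$, which is exactly the freedom that the "$p - 1$" method lacks.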
4 Minimal Perfect Hashing

Hashing is a common technique used to map words into a small set of integers (which may
then be used as indices to address a table). Thus, the computation $r_1 \leftarrow N_1 \bmod p$ used in our
"Galileo" example can be considered as a hash function.

Formally, consider a set

$W = \{w_0, w_1, \ldots, w_{m-1}\}$

of $m$ words $w_j$, each of which is a finite string of symbols over a finite alphabet $\Sigma$. A hash
function is a function

$h : W \to I$,

where $I = \{0, 1, \ldots, k - 1\}$ and $k$ is a fixed integer (the table size).

Collisions

A collision occurs if two words $w_1$ and $w_2$ map to the same address, i.e. if $h(w_1) = h(w_2)$.
There are various techniques for handling collisions [29]. However, these complicate the algorithms
and introduce inefficiencies. In applications where $W$ is fixed (e.g. the reserved words in
a compiler), it is worth trying to avoid collisions.

Perfection

If there are no collisions, the hash function is called perfect.

Minimal Perfection

For a perfect hash function, we must have $k \ge m$. If $k = m$ the hash function is minimal.

Problem

Given a set $W$, how can we compute a minimal perfect hash function?

The CHM Algorithm

Czech, Havas and Majewski (CHM) [14] give a probabilistic algorithm which runs in expected
time $O(m)$ (ignoring the effect of finite word-length). Their algorithm uses some properties of
random graphs.

Take $n = 3m$, and let

$V = \{1, 2, \ldots, n\}$.

CHM take two independent pseudo-random functions⁶

$f_1 : W \to V, \quad f_2 : W \to V$,

and let

$E = \{(f_1(w), f_2(w)) \mid w \in W\}$.

We can think of $G = (V, E)$ as a random graph with $n$ vertices $V$ and (at most) $m$ edges $E$.

Acyclicity

If $G$ has fewer than $m$ edges or $G$ has cycles, CHM reject the choice of $f_1, f_2$ and try again.
Eventually they get a graph $G$ with $m$ edges and no cycles. Because $n = 3m$, the expected
number of trials is a constant (about $\sqrt{3}$, or more generally $\sqrt{n/(n - 2m)}$, for large $m$ and $n > 2m$).

⁶ How can this be done? This is a theoretical weak point of the algorithm, but in practice the solution given
in [14] is satisfactory.

The Perfect Hash Function

Once an acceptable $G$ has been found, it is easy to compute (and store in a table) a function

$g : V \to \{0, 1, \ldots, m - 1\}$

such that

$h(w) = g(f_1(w)) + g(f_2(w)) \bmod m$

is the desired minimal perfect hash function. We can even get

$h(w_j) = j$

for $j = 0, 1, \ldots, m - 1$. All this requires is a depth-first search of $G$.
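The whole construction fits in a short program. The sketch below follows the steps just described but is not CHM's code: the pseudo-random functions are built from lazily filled tables of random integers indexed by (position, symbol), vertices are numbered from 0 instead of 1, and the names `make_hash`, `assign_g` and `build_mphf` are our own.

```python
import random


def make_hash(seed, n):
    """Hypothetical pseudo-random function from words to {0, ..., n-1}."""
    rng = random.Random(seed)
    table = {}

    def f(word):
        total = 0
        for key in enumerate(word):      # key = (position, symbol)
            if key not in table:
                table[key] = rng.randrange(n)
            total += table[key]
        return total % n

    return f


def assign_g(adj, n, m):
    """Depth-first search; returns the table g, or None if the graph has a cycle."""
    g = [None] * n
    for root in range(n):
        if g[root] is not None:
            continue
        g[root] = 0
        stack = [(root, -1)]
        while stack:
            u, via = stack.pop()
            for v, j in adj[u]:
                if j == via:
                    continue             # do not walk back along the edge we came by
                if g[v] is not None:
                    return None          # reached a labelled vertex again: cycle
                g[v] = (j - g[u]) % m    # forces g[u] + g[v] = j (mod m)
                stack.append((v, j))
    return g


def build_mphf(words, seed=0, max_trials=1000):
    """Return h with h(words[j]) == j, retrying until the random graph is acceptable."""
    m, n = len(words), 3 * len(words)
    rng = random.Random(seed)
    for _ in range(max_trials):
        f1 = make_hash(rng.getrandbits(64), n)
        f2 = make_hash(rng.getrandbits(64), n)
        adj = [[] for _ in range(n)]
        seen = set()
        for j, w in enumerate(words):
            u, v = f1(w), f2(w)
            edge = (min(u, v), max(u, v))
            if u == v or edge in seen:
                break                    # self-loop or repeated edge: reject
            seen.add(edge)
            adj[u].append((v, j))
            adj[v].append((u, j))
        else:
            g = assign_g(adj, n, m)
            if g is not None:
                return lambda w: (g[f1(w)] + g[f2(w)]) % m
    raise RuntimeError("no acceptable graph found")
```

Because edge $j$ joins $f_1(w_j)$ and $f_2(w_j)$, and the search labels the far end of each edge so that the two labels sum to $j$, the resulting $h$ maps the $j$-th word to $j$. On an acyclic graph no label is ever forced twice, which is why cycles must be rejected.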

Implementation

CHM report that on a Sun SPARCstation 2 they can generate a minimal perfect hash
function for a set of $m = 2^{19}$ words in 33 seconds. Earlier algorithms required time which (at
least in the worst case) was an exponentially increasing function of $m$, so could only handle very
small $m$.

5 Permutation Routing 

A network $G$ is a connected, undirected graph with $N$ vertices $0, 1, \ldots, N - 1$.

The permutation routing problem on $G$ is: given a permutation $\pi$ of the vertices, and a
message (called a packet) on each vertex, route packet $j$ from vertex $j$ to vertex $\pi(j)$. It is
assumed that at most one packet can traverse each edge in unit time, and that we want to
minimise the time for the routing.

In practice we only want to consider oblivious algorithms, where the route taken by packet
$j$ depends only on $(j, \pi(j))$.

For simplicity, assume that $G$ is a $d$-dimensional hypercube, so $N = 2^d$. Similar results
apply to other networks.

Example: Leading Bit Routing 

A simple algorithm for routing packets on a hypercube chooses which edge to send a packet 
along by comparing the current address and the destination address and finding the highest 
order bit position in which these addresses differ. 

For example, consider the bit-reversal permutation $01001001 \to 10010010$.

01001001
|
11001001
|
10001001
|
10011001
|
10010001
|
10010011
|
10010010

Each "|" corresponds to traversal of an edge in the hypercube.
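The routing rule is short enough to state as code. In this sketch vertex addresses are Python integers whose bits are the hypercube coordinates; the function name `leading_bit_route` is ours.

```python
def leading_bit_route(src, dst, d):
    """Vertices visited when routing from src to dst in a d-dimensional hypercube.

    At each step the highest-order bit in which the current address differs
    from the destination is corrected, which moves the packet along one edge.
    """
    path = [src]
    cur = src
    for bit in range(d - 1, -1, -1):
        if ((cur ^ dst) >> bit) & 1:
            cur ^= 1 << bit
            path.append(cur)
    return path


if __name__ == "__main__":
    for vertex in leading_bit_route(0b01001001, 0b10010010, 8):
        print(format(vertex, "08b"))
```

For the bit-reversal example the two addresses differ in six bit positions, so the packet traverses six edges.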

Borodin and Hopcroft's bound

The following result [8] says that there are no "uniformly good" deterministic algorithms for
oblivious permutation routing:

Theorem: For any deterministic, oblivious permutation routing algorithm, there is a permutation
$\pi$ for which the routing takes $\Omega(\sqrt{N/d^3})$ steps.

Example: For the leading-bit routing algorithm, take $\pi$ to be the bit-reversal permutation, i.e.

$\pi(b_0 b_1 \ldots b_{d-1}) = b_{d-1} \ldots b_1 b_0$.

Suppose $d$ is even. Then at least $2^{d/2}$ packets are routed through vertex 0. To prove this,
consider the routing of

$xx \ldots xx00 \ldots 00$,

where there are at least $d/2$ trailing zeros.
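The congestion at vertex 0 can be confirmed by enumerating all $2^d$ routes for a small $d$. The helper names below are ours, and the routing function repeats the leading-bit rule so that the sketch is self-contained.

```python
def leading_bit_route(src, dst, d):
    """Vertices visited when the highest differing bit is corrected first."""
    path = [src]
    cur = src
    for bit in range(d - 1, -1, -1):
        if ((cur ^ dst) >> bit) & 1:
            cur ^= 1 << bit
            path.append(cur)
    return path


def bit_reverse(v, d):
    """Reverse the d-bit binary representation of v."""
    return int(format(v, "0{}b".format(d))[::-1], 2)


def packets_through(vertex, d):
    """Number of packets whose bit-reversal route visits the given vertex."""
    return sum(vertex in leading_bit_route(j, bit_reverse(j, d), d)
               for j in range(1 << d))


if __name__ == "__main__":
    for d in (4, 6, 8, 10):
        print(d, packets_through(0, d), 2 ** (d // 2))
```

A source whose low $d/2$ bits are zero has a destination whose high $d/2$ bits are zero, so clearing the high bits first brings the packet to vertex 0 before any low bit is set.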

Valiant and Brebner's algorithm

We can do much better with a probabilistic algorithm. Valiant suggested:

1. Choose a random mapping $\sigma$ (not necessarily a permutation).

2. Route message $j$ from vertex $j$ to vertex $\sigma(j)$ using the leading bit algorithm (for $0 \le j < N$).

3. Route message $j$ from vertex $\sigma(j)$ to vertex $\pi(j)$.

This seems crazy⁷, but it works! Valiant and Brebner [60] prove:

Theorem: With probability greater than $1 - 1/N$, every packet reaches its destination in at most
$14d$ steps.

Corollary: The expected number of steps to route all packets is less than $15d$.
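A full simulation of the routing, with queues on the edges, is beyond a short example, but the effect of the random intermediate destinations can be seen with a crude proxy: the largest number of packets whose routes pass through any single vertex. The sketch below is only that proxy, not a measurement of routing time. It assumes that step 3 also uses the leading bit algorithm, and the function names and the fixed seed are ours.

```python
import random
from collections import Counter


def leading_bit_route(src, dst, d):
    """Vertices visited when the highest differing bit is corrected first."""
    path = [src]
    cur = src
    for bit in range(d - 1, -1, -1):
        if ((cur ^ dst) >> bit) & 1:
            cur ^= 1 << bit
            path.append(cur)
    return path


def bit_reverse(v, d):
    """Reverse the d-bit binary representation of v."""
    return int(format(v, "0{}b".format(d))[::-1], 2)


def max_vertex_load(d, two_phase, seed=0):
    """Largest number of packets visiting one vertex, for the bit-reversal permutation."""
    rng = random.Random(seed)
    load = Counter()
    for j in range(1 << d):
        target = bit_reverse(j, d)
        if two_phase:
            mid = rng.randrange(1 << d)          # sigma(j), chosen independently
            path = (leading_bit_route(j, mid, d)
                    + leading_bit_route(mid, target, d)[1:])
        else:
            path = leading_bit_route(j, target, d)
        load.update(set(path))                   # count each packet once per vertex
    return max(load.values())


if __name__ == "__main__":
    d = 12
    print("direct:   ", max_vertex_load(d, two_phase=False))
    print("two-phase:", max_vertex_load(d, two_phase=True))
```

With direct routing the maximum load grows like $2^{d/2}$, while with the random detour it stays within a small multiple of $d$ with overwhelming probability.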

6 Pseudo-deterministic Algorithms 

Some probabilistic algorithms use many independent random numbers, and because of the "law 
of large numbers" their performance is very predictable. One example is the multiple-polynomial 
quadratic sieve (MPQS) algorithm for integer factorisation. 

Suppose we want to factor a large composite number $N$ (not a perfect power). The key idea
of MPQS is to generate a sufficiently large number of congruences of the form

$y^2 = p_1^{\alpha_1} \cdots p_k^{\alpha_k} \bmod N$,

where $p_1, \ldots, p_k$ are small primes in a precomputed "factor base", and $y$ is close to $\sqrt{N}$. Many
$y$ are tried, and the "successful" ones are found efficiently by a sieving process.
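The following toy example shows what such congruences look like and how they lead to a factor. It is far from MPQS itself: it uses $y^2 \bmod N$ directly with no polynomials, finds the "successful" $y$ by trial division instead of sieving, and combines relations by inspection instead of linear algebra. The function names and the example $N = 1649$ are our choices.

```python
from math import gcd, isqrt


def smooth_exponents(value, factor_base):
    """Exponents of value over the factor base, or None if it does not factor completely."""
    if value == 0:
        return None
    exponents = []
    for p in factor_base:
        e = 0
        while value % p == 0:
            value //= p
            e += 1
        exponents.append(e)
    return exponents if value == 1 else None


def find_relations(N, factor_base, span):
    """Try y just above sqrt(N); keep those whose y^2 mod N is smooth."""
    start = isqrt(N) + 1
    relations = []
    for y in range(start, start + span):
        exponents = smooth_exponents(y * y % N, factor_base)
        if exponents is not None:
            relations.append((y, exponents))
    return relations


if __name__ == "__main__":
    N = 1649
    print(find_relations(N, [2, 3, 5], 3))
    # 41^2 = 2^5 and 43^2 = 2^3 * 5^2 (mod N); their product is (2^4 * 5)^2 = 80^2,
    # so (41 * 43)^2 = 80^2 (mod N) and a gcd reveals a factor.
    print(gcd(41 * 43 - 80, N))
```

Once enough relations are known, a product with all exponents even can always be found by solving a linear system over GF(2), which is the "very large linear system" that dominates the final stage of a real factorisation.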

Making some plausible assumptions, the expected run time of MPQS is

$T = O\left(\exp\left(\sqrt{c \log N \log\log N}\right)\right)$,

where $c \approx 1$. In practice, this estimate is good and the variance is small.

⁷ I do not know of any manufacturer who has been persuaded to implement it. Probably it would be hard to
sell.

MPQS Example

MPQS is currently the best general-purpose algorithm for factoring moderately large numbers
$N$ whose factors are in the range $N^{1/3}$ to $N^{1/2}$. For example, A. K. Lenstra and M. S. Manasse
recently found

$3^{329} + 1 = 2^2 \cdot 547 \cdot 16921 \cdot 256057 \cdot 36913801 \cdot 177140839 \cdot 1534179947851 \cdot p_{50} \cdot p_{67}$,

where the penultimate factor $p_{50}$ is a 50-digit prime

24677078822840014266652779036768062918372697435241,

and the largest factor $p_{67}$ is a 67-digit prime.

The computation used a network of workstations for "sieving", then a super-computer for
the solution of a very large linear system.

A "random" 129-digit number (RSA129) has just been factored in a similar way to win a 
$100 prize offered by Rivest, Shamir and Adleman in 1977. 

7 Complexity Theory of Probabilistic Algorithms 

Do probabilistic algorithms have an advantage over deterministic algorithms? If we allow a
small probability of error, the answer is yes, as we saw for the Galileo example. If no error is
allowed, the answer is (probably) no.

A. C. Yao considered probabilistic algorithms (modelled as decision trees) for testing properties
$P$ of undirected graphs (given by their adjacency matrices) on $n$ vertices. He also considered
deterministic algorithms which assume a given distribution of inputs (i.e. a distribution over the
set of graphs with $n$ vertices).

Definitions 

Yao defines

randomized complexity $F_R(P)$ as an
   infimum (over all possible algorithms) of a
   maximum (over all graphs with $n$ vertices) of the
   expected runtime

and

distributional complexity $F_D(P)$ as a
   supremum (over input distributions) of a
   minimum (over all possible deterministic algorithms) of the
   average runtime.

Informally, $F_R(P)$ is how long the best probabilistic algorithm takes for testing $P$; and $F_D(P)$
is the average runtime we can always guarantee with a good deterministic algorithm, provided
the distribution of inputs is known.
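In symbols, the two definitions might be written as below. The notation is ours and not Yao's: $A$ ranges over probabilistic algorithms, $D$ over deterministic algorithms, $G$ over graphs on $n$ vertices, $\mu$ over probability distributions on such graphs, and $T_A(G)$ denotes the runtime of $A$ on input $G$.

```latex
F_R(P) = \inf_{A} \; \max_{G} \; \mathbb{E}\,[T_A(G)],
\qquad
F_D(P) = \sup_{\mu} \; \min_{D} \; \mathbb{E}_{G \sim \mu}\,[T_D(G)].
```

The first expectation is over the random choices made by the algorithm on a fixed input; the second is over the choice of input for a fixed deterministic algorithm.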

Yao's Result 

Yao (1977) claims that $F_D(P) = F_R(P)$ follows from the minimax theorem of John von
Neumann (1928). The minimax theorem is familiar from the theory of two-person zero-sum
games.

So What?

Yao's result should not discourage the use of probabilistic algorithms - we have already given
several examples where they out-perform known deterministic algorithms, and there are many
similar examples.

Yao's computational model is very restrictive. Because $n$ is fixed, table lookup is permitted,
and the maximum complexity of any problem is $O(n^2)$.

Adleman and Gill's result 

Less restrictive models have been considered by Adleman and Gill. Without going into 
details of the definitions, they prove: 

Theorem: If a Boolean function has a randomised, polynomial-sized circuit family, then it has a 
deterministic, polynomial-sized circuit family. 

There are two problems with this result: 

• The deterministic circuit may be larger (by a factor of about $n$, the number of variables)
than the original circuit.

• The transformation is not "uniform" - it cannot be computed in polynomial time by a
Turing machine. The proof of the theorem is by a counting argument applied to a matrix
with $2^n$ rows, so it is not constructive in a practical sense.

8 The Class RP 

We can formalise the notion of a probabilistic algorithm and define a class RP of languages $L$
such that $x \in L$ is accepted by a probabilistic algorithm in polynomial time with probability
$p > 1/2$ say⁸, but $x \notin L$ is never accepted. Clearly

$P \subseteq RP \subseteq NP$,

where P and NP are the well-known classes of problems which are accepted in polynomial time
by deterministic and nondeterministic (respectively) algorithms.
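To see why the particular acceptance threshold in the definition of RP does not matter, suppose (in our notation) that a single run accepts every $x \in L$ with probability at least $p$ for some fixed $p$ in $(0, 1)$, and that we make $k$ independent runs, accepting if any one of them accepts:

```latex
\Pr[\text{accept } x \in L] \;\ge\; 1 - (1 - p)^k,
\qquad
1 - (1 - p)^k \ge \tfrac{1}{2}
\iff
k \ge \frac{\ln 2}{\ln\bigl(1/(1 - p)\bigr)}.
```

Since $k$ is a constant depending only on $p$, the running time remains polynomial, and an $x \notin L$ is still never accepted because no individual run accepts it.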

It is plausible that

$P \subset RP \subset NP$,

but this would imply that $P \ne NP$, so it is a difficult question.

9 Perfect Parties 

B. McKay (ANU) and S. Radziszowski are interested in the size of the largest "perfect party". 
Because people at parties tend to cluster in groups of five, we consider a party to be imperfect if 
there are five people who are mutual acquaintances, or five who are mutual strangers. A perfect 
party is one which is not imperfect. 

McKay et al have performed a probabilistic computation which shows that, with high probability,
the largest perfect party has 42 people.

Ramsey Numbers 

$R(s, t)$ is the smallest $n$ such that each graph on $n$ or more vertices has a clique of size $s$ or
an independent set of size $t$.

Examples: $R(3, 3) = 6$, $R(4, 4) = 18$, $R(4, 5) = 25$, and $43 \le R(5, 5) \le 49$. See [38, 39].
Perfect party organisers would like to know $R(5, 5) - 1$.

⁸ Any fixed value in (0, 1) can be used in the definition.

The Computation

A $(5, 5, n)$-graph is a graph with $n$ vertices, no clique of size 5, and no independent set of
size 5. There are 328 known $(5, 5, 42)$-graphs, not counting complements as different. McKay
et al generated 5812 $(5, 5, 42)$-graphs using simulated annealing, starting at random graphs. All
5812 turned out to be known.

If there were any more $(5, 5, 42)$-graphs, and if the simulated annealing process is about
equally likely to find any $(5, 5, 42)$-graph⁹, then another such graph would have been found with
probability greater than

0.99999998
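The order of magnitude of this figure is easy to reproduce. The sketch below assumes the simplest reading of the argument, which is ours: exactly one unknown graph exists besides the 328 known ones, and each of the 5812 runs independently returns a uniformly chosen graph. The function name is hypothetical.

```python
def miss_probability(known, samples, extra=1):
    """Chance that `samples` uniform draws from known + extra graphs never hit an extra one."""
    return (known / (known + extra)) ** samples


if __name__ == "__main__":
    miss = miss_probability(328, 5812)
    print("probability of missing an extra graph:", miss)   # about 2e-8
    print("probability of finding it:            ", 1 - miss)
```

Under these assumptions the chance of never seeing the extra graph is about $2 \times 10^{-8}$, which is close to the quoted figure; more unknown graphs would only make a miss less likely.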

Thus, there is convincing evidence that all $(5, 5, 42)$-graphs are known. None of these graphs
can be extended to $(5, 5, 43)$-graphs. Thus, it is very unlikely that such a graph exists, and it is
very likely that

$R(5, 5) - 1 = 42$.

A Rigorous Proof?

A rigorous proof that $R(5, 5) - 1 = 42$ would take thousands of years of computer time¹⁰,
so the probabilistic argument is the best that is feasible at present, unless we can get time on a
computer as fast as Deep Thought [1].

10 Omissions 

We did not have time to mention applications of randomness to serial or parallel algorithms for: 

• sorting and selection, 

• computer security, 

• cryptography, 

• computational geometry, 

• load-balancing, 

• collision avoidance, 

• online algorithms, 

• optimisation, 

• numerical integration, 

• graphics and virtual reality, 

• avoiding degeneracy, 

• approximation algorithms for NP-hard problems, 

and many other problems. References to most of these applications are given in the bibliography
below (see for example [42, 53]).

⁹ There is no obvious way to prove this, so the probability estimate is not rigorous.

¹⁰ Based on the fact that it took seven years of Sparcstation time to show that $R(4, 5) = 25$.

Another Omission 

We did not discuss algorithms for generating pseudo-random numbers - that would require
another talk¹¹.

11 Conclusion 

• Probabilistic algorithms are useful. 

• They are often simpler and use less space than deterministic algorithms. 

• They can also be faster, if we are willing to live with a minute probability of error. 

Some Open Problems 

• Give good lower bounds for the complexity of probabilistic algorithms (with and without 
error) for interesting problems. 

• Show how to generate independent random samples from interesting structures (e.g. finite 
groups defined by relations, various classes of graphs, . . .) to provide a foundation for 
probabilistic algorithms on these structures. 

• Consider the effect of using pseudo-random numbers instead of genuinely random numbers. 

• Extend Yao's results to a more realistic model of computation. 

• Give a uniform variant of the Adleman-Gill theorem. 

• Show that $P \ne RP$ (hard).